5  Comparison of macroinverbrate biodiversity

Now that we’ve generated all this data we can use it to address our first research objective.

Research Objective 1:

Compare species richness and diversity for selected headwater streams across spring, summer, and fall.

Keep in mind that we have data sets from two locations (Rand Brook and Schoolhouse Brook) for three different sampling seasons from multiple years. Our sampling windows are based on the NEON protocol for sampling macroinvertebrates as their long-term monitoring stations.

You can download the data sets from our shared google folder.

5.1 Get set up

Before we can get started let’s load our packages.

# read libraries
library(readr)
library(janitor)
library(lubridate)
library(tidyr)
library(dplyr)
library(glue)
library(ggplot2)
library(patchwork)
library(knitr)

Now, we can read in our data sets.

rand_2210 <- read_delim("data/macros_RAND_2022-10-19.tsv", delim = "\t") %>%
  clean_names()

rand_2305 <- read_delim("data/macros_RAND_2023-05-30.tsv", delim = "\t") %>%
  clean_names()

rand_2307 <- read_delim("data/macros_RAND_2023-07-07.tsv", delim = "\t") %>%
  clean_names()

rand_2310 <- read_delim("data/macros_RAND_2023-10-18.tsv", delim = "\t") %>%
  clean_names()

rand_2407 <- read_delim("data/macros_RAND_2024-07-09.tsv", delim = "\t") %>%
  clean_names()

rand_2410 <- read_delim("data/macros_RAND_2024-10-02.tsv", delim = "\t") %>%
  clean_names()

schb_2210 <- read_delim("data/macros_SCHB_2022-11-02.tsv", delim = "\t") %>%
  clean_names()

schb_2305 <- read_delim("data/macros_SCHB_2023-05-30.tsv", delim = "\t") %>%
  clean_names()

schb_2410 <- read_delim("data/macros_SCHB_2024-10-23.tsv", delim = "\t") %>%
  clean_names()

First, we will want to look at the number of individuals in each taxonomic group and calculate their relative abundances and compare species richness and diversity across the two locations.

Let’s start with the data set from our visit to Rand Brook this semester.

We can figure out the richness by determining the number of individuals in each of the different orders and family.

kable(
  rand_2410 %>%
    group_by(order) %>%
    count()
)
order n
Coleoptera 18
Diptera 5
Ephemeroptera 5
Hemiptera 1
Megaloptera 4
Odonata 27
Plecoptera 11
Trichoptera 26

This is helpful information but since our sample sizes differ among locations we will want to determine the relative abundance for each location.

We can do this by adding a column using mutate() and then telling R to divide the number of individuals in a certain group by the total number of individuals sampled at that location for each row in the data frame.

kable(
  rand_2410 %>%
    group_by(order) %>%
    count() %>%
    ungroup() %>%
    mutate(rel_abund = round(n/sum(n)*100, digits = 1))
)
order n rel_abund
Coleoptera 18 18.6
Diptera 5 5.2
Ephemeroptera 5 5.2
Hemiptera 1 1.0
Megaloptera 4 4.1
Odonata 27 27.8
Plecoptera 11 11.3
Trichoptera 26 26.8
Consider this

Use your new found skills to calculate richness and relative abundances for all our data sets by order and family and briefly describe your results.

Whenever we are describing results you should start by stating the general trends and patterns. The goal is to lower the cognitive burden for your audience - this means that you are doing the thinking for them and presenting the data so that even if they don’t have tables in front of them they have a good idea of what is being compared and what the major pattern is.

Once you have identified the general trends, you want to point out notable highlights. The trick here is to not get lost in the details.

Finally, look at each data set separately to identify patterns and then start making comparisons within and among sites.

Here are some questions you can ask yourself to help you think through your results:

  • are there dominant species? is there an even distribution? in both locations? only one?
  • are all taxonomic groups in both locations? how much overlap is there? does this differ depending on the taxonomic resolution you are looking at?
  • are there seasonal patterns? in one location? in both locations?
  • are there notable differences among locations? e.g. taxonomic group occurs in both locations but is really common in one location and more rare in another.

Your answer should be 5-7 sentences.

Consider this

Other than a massive table with all the results, is there a good way to visualize this data set to make patterns more apparent?

We could create a heatplot that shows all the sample sets and color codes the tiles according to relative abundance.

First, we need to combine all of our data sets into a single one.

macros <- bind_rows(rand_2210,
                    rand_2305,
                    rand_2307,
                    rand_2310,
                    rand_2407,
                    rand_2410,
                    schb_2210,
                    schb_2305,
                    schb_2410) %>%
  mutate(month = month(sample_date),
         year = year(sample_date),
         sample_bout = glue("{site_id}_{year}-{month}")) %>%
  group_by(sample_bout, order) %>%
    count() %>%
    ungroup() %>%
    group_by(sample_bout) %>%
    separate(sample_bout, into = c("site_id", "year", "month"), remove = FALSE) %>%
    mutate(rel_abund = round(n/sum(n)*100, digits = 1),
           season = case_when(sample_bout %in% c("RAND_2022-10", "RAND_2023-10", "RAND_2024-10",
                                        "SCHB_2022-11", "SCHB_2024-10") ~ "Fall",
                     sample_bout %in% c("RAND_2023-5", "SCHB_2023-5") ~ "Spring",
                     sample_bout %in% c("RAND_2023-7", "RAND_2024-7") ~ "Summer")) %>%
  unite(sample_bout, season, year, sep = " ")

Now we can create a heatplot for relative abundance by order.

ggplot(macros, aes(x = sample_bout, y = order, fill = rel_abund)) +
  geom_tile() +
  scale_fill_viridis_c(option = "E", direction = -1) +
  facet_grid(. ~ site_id, scales = "free", space = "free") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "bottom")

Now you can make the same figure by family.

macros <- bind_rows(rand_2210,
                    rand_2305,
                    rand_2307,
                    rand_2310,
                    rand_2407,
                    rand_2410,
                    schb_2210,
                    schb_2305,
                    schb_2410) %>%
  mutate(month = month(sample_date),
         year = year(sample_date),
         sample_bout = glue("{site_id}_{year}-{month}")) %>%
  group_by(sample_bout, family) %>%
    count() %>%
    ungroup() %>%
    group_by(sample_bout) %>%
    separate(sample_bout, into = c("site_id", "year", "month"), remove = FALSE) %>%
    mutate(rel_abund = round(n/sum(n)*100, digits = 1),
           season = case_when(sample_bout %in% c("RAND_2022-10", "RAND_2023-10", "RAND_2024-10",
                                        "SCHB_2022-11", "SCHB_2024-10") ~ "Fall",
                     sample_bout %in% c("RAND_2023-5", "SCHB_2023-5") ~ "Spring",
                     sample_bout %in% c("RAND_2023-7", "RAND_2024-7") ~ "Summer")) %>%
  unite(sample_bout, season, year, sep = " ")

ggplot(macros, aes(x = sample_bout, y = family, fill = rel_abund)) +
  geom_tile() +
  scale_fill_viridis_c(option = "E", direction = -1) +
  facet_grid(. ~ site_id, scales = "free", space = "free") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1),
        legend.position = "bottom")